- Path: nntp.teleport.com!sschaem
- From: sschaem@teleport.com (Stephan Schaem)
- Newsgroups: comp.sys.amiga.programmer
- Subject: Re: Amiga doesn`t need Pl
- Date: 1 Mar 1996 00:01:29 GMT
- Organization: Teleport - Portland's Public Access (503) 220-1016
- Distribution: world
- Message-ID: <4h5eop$hvl@maureen.teleport.com>
- References: <john.hendrikx.4hkq@grafix.xs4all.nl>
- NNTP-Posting-Host: linda.teleport.com
- X-Newsreader: TIN [version 1.2 PL2]
-
- John Hendrikx (john.hendrikx@grafix.xs4all.nl) wrote:
- : In a message of 24 Feb 96 Stephan Schaem wrote to All:
-
- : SS> The CPU is not the problem, the Amiga HW (from CBM) sux big time.
- : SS> The CPU is not the problem... an 030 can render Doom in 'fastmem'
- : SS> 'easy'.
-
- : I don't think so, maybe the 50 MHz version, but they still won't get 25 FPS
- : or so at 320x256 1x1 (talking DOOM here, not some Wolfenstein clone with floors
- : which I see all too often).
-
- Rendering a static scene should achieve 25 fps at 320x200 (60 Hz) on
- a 25 MHz 030 (50% walls, 50% floor/ceiling). I'm only talking about rendering
- to fastmem, with no c2p into chipmem... I just wanted to point out that
- the CPU is not what stops the Amiga from playing something like Doom.
- (Well, a 14 MHz 020 is.)
-
- : SS> The killing factor is the slow video memory. An 030 competes easily with
- : SS> a 486 in integer operations (MHz for MHz, not on an instruction basis but
- : SS> over a small cached loop... like tmap)
-
- : Do you really think so? On 030 the fastest instructions available take 2
- : cycles, while most instructions on 486 take 1 cycle. 486 also has much faster
- : Mul and Div instructions. You would be better off comparing the 040 with the
- : 486.
-
- Well, the 68030 has 16 'general' purpose registers (let's not argue about the
- details :) and separate data/instruction caches. Does the 486 halt during bus
- access? Do you know how many cycles are needed to tmap a gouraud-lit
- pixel on a 486? On an 030 it's 28 cycles or so.
- Also, this takes 6 cycles on an 030: addi.l #32bitvalue,(localvariable,a7)
- How many on a 486 and a 386? (Sorry, I don't have any x86 stuff :( )
-
- : >> Yes it does, see TextDemo. The percentage of CPU time used for the C2P
- : >> is NON-EXISTENT on the clones, because the 'fast-ram buffer' we use on
- : >> Amiga is called 'the screen' on the clones. No extra copying (or
- : >> converting for that matter) needed.
-
- : SS> Again the problem is not c2p but slow video memory... Does the PC always
- : SS> cache video memory in the L1 cache? I hear of many people rendering in
- : SS> local mem then doing a copy.
-
- : Of course the video memory is not cached in the L1 cache, for the same reason
- : as ChipRAM isn't cached on Amiga. To copy the stuff to video ram why not
- : simply ask the DMA controller to copy that shit for you while you render the
- : next frame? Also why wouldn't the same trick to get 'free' cycles on Amiga
- : while doing ChipRAM writes work with the clones' much faster Video RAM? While
- : writing the pixel to video ram the processor continues to calculate the next
- : TMapped pixel.
-
- I'm very unfamiliar with the PC... but I heard that PC DMA is dead slow,
- something like 4 MB/second to DMA system memory... Do all PC video cards have DMA?
- I also heard that PC DMA blocks the bus during access.
- I actually don't know whether a 386 or 486 can continue execution while a bus
- write is pending (the 040 allows up to 3; how many on the 486?)
-
- If you use DMA and render in local mem on the PC, you can't really say it's different
- from rendering in fastmem on the Amiga and call it "the screen"...
-
- And if you do byte accesses directly to the video card memory, don't you
- think it would be slower than rendering in the L1 cache plus the copy?
-
- : >> That's TextDemo 5.7x (unreleased version) someone tested for me. 15-20
- : >> FPS for a 68060/50 which is supposed to be 2-3 times as powerful as a
- : >> 486DX2/50 is quite depressing, considering that that 486 will do it at 30
- : >> FPS. Now just translate that to the slower Amigas (ie, the ones only
- : >> equipped with 030's and 040's).
-
- : SS> 1.2 meg, around 15 frames a second used to copy the fastmem buffer to
- : SS> chip. So 100mips*75% / (320*200*20) = 58.5 cycles per pixel rendered
- : SS> in 060 local mem! That is HUGE, when you know that an 040 needs ~10
- : SS> cycles per pixel to do floor/ceiling gouraud shaded texture mapping.
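
The arithmetic in the quoted estimate can be reproduced with a few lines. This is only a back-of-envelope sketch: the function name is made up for illustration, and it assumes "100mips" is treated as 100 million cycles per second, that 75% of them are left after the copy to chipmem, and that the target is 320x200 at 20 frames a second.

```python
def cycles_left_per_pixel(cycles_per_second, cpu_share, width, height, fps):
    """Cycles available per rendered pixel once the copy has taken its share.

    cpu_share is the fraction of CPU time left for rendering (0.75 here).
    All inputs are assumptions taken from the quoted estimate.
    """
    pixels_per_second = width * height * fps
    return cycles_per_second * cpu_share / pixels_per_second


budget = cycles_left_per_pixel(100_000_000, 0.75, 320, 200, 20)
print(f"{budget:.2f} cycles per pixel")
```

With those inputs the result is just under 58.6, which matches the roughly 58.5 cycles per pixel quoted above.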
-
- : I doubt this 10 cycle routine of yours is very useful for realistic purposes
- : judging from all the 'unrealistic' TMap routines I've seen here lately (ones
- : which rely on 64K boundaries or too big or too small Textures).
-
- It textures anything using a 64K tmap with 24-bit/16-bit/8-bit fixed point
- and gouraud lighting. I don't see why using segments is 'unrealistic'?!
- The limitation: you can't have a texture on a single polygon bigger than
- 256x256 pixels in 256 colors. (But you can subdivide the polygon and use
- x tmaps; my guess is that's rarely needed.) That's it; you can even use subpixel
- U/V with up to 65536 subdivisions per texel :)
- I'm already using the inner loop in a test case (zoom/rotate of an image).
- When I finish my polygon render routine with full quadratic interpolation
- and subdivision, I will integrate that inner loop (it's designed around it).
- (This loop was proposed here some days ago, and I think it has already been
- integrated and tested in a 3D environment in a 1/z subdivision mapper.)
- (All this is for a Quake-style render engine, not Doom.)
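
To make the 64K / 256x256 limit concrete, here is a minimal sketch of fixed-point texture stepping. It is not the 68k inner loop discussed in this thread: the bit layout (8 integer bits plus 16 fractional bits for U and V), the names `to_fixed` and `map_span`, and the test texture are all assumptions chosen for illustration, and gouraud lighting is left out.

```python
TEX_SIZE = 256    # 256x256 texels at one byte each: exactly 64K
FRAC_BITS = 16    # 16 fractional bits: 65536 sub-texel steps per texel


def to_fixed(x):
    """Convert a texel coordinate to fixed point (hypothetical helper)."""
    return int(round(x * (1 << FRAC_BITS)))


def map_span(texture, u, v, du, dv, count):
    """Sample `count` texels along a straight line through the texture.

    u, v, du and dv are fixed-point values with FRAC_BITS fractional bits.
    Masking the integer part to 8 bits keeps every lookup inside the 64K
    block, so the texture wraps instead of reading out of bounds.
    """
    out = bytearray(count)
    for i in range(count):
        tu = (u >> FRAC_BITS) & 0xFF       # integer texel column
        tv = (v >> FRAC_BITS) & 0xFF       # integer texel row
        out[i] = texture[(tv << 8) | tu]   # row * 256 + column
        u += du
        v += dv
    return out


# Test texture in which every texel stores its own column index.
texture = bytearray(x for _y in range(TEX_SIZE) for x in range(TEX_SIZE))

# Step half a texel per pixel along row 3, starting at column 10.
span = map_span(texture, to_fixed(10.0), to_fixed(3.0), to_fixed(0.5), 0, 6)
print(list(span))  # [10, 10, 11, 11, 12, 12]
```

The 8-bit mask is what ties a texture to a single 64K block: a larger texture would need a wider index, which is why bigger polygons have to be subdivided across several maps.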
-
- : The routine used to do (plain shaded) wall-mapping in TD takes 18 cycles/pixel
- : (030 cycles). The floor/ceiling mapper is not the best possible anymore (I've
- : seen a *useful* trick presented here recently which I could have used in the
- : floor/ceiling mapper).
-
- 17 cycles on an 030 if you use segments, 15 cycles if you use pattern
- repetition... Precalculating the stepping is slower on 020/030.
-
- : SS> why would a 50mhz 060 be 6 time slower then a 40mhz 040 when working
- : SS> only in local fastmem?!?!!!?!??!!? (I assume here that you do gouraud
- : SS> shade your quads)
-
- : It DIDN'T work in local fastmem (did I say that?). This included C2P time. It
- : was run in 320x240 1x1, 8-bit, full floors, ceilings and walls in DOOM style.
-
- I bet that if you had a 300 MHz 060, the c2p wouldn't be any faster than on the
- 25 MHz 040... The 060 is not to blame.
-
- This is why I believe the Amiga has enough power, even with a 25 MHz 030, to
- do Doom (but with flat-shaded tmapping) at ~15-20 fps if it had a local bus video
- card.
-
- Stephan
-